{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Summarizing documents with Kernel Memory\n",
    "\n",
    "This notebook shows how to leverage Kernel Memory pipelines to generate custom information such as a summaries.\n",
    "\n",
    "In KM terms, a Summary is a Synthetic Memory, which means it's some information derived from other memories. KM includes one example of Synthetic Memory generation, in the form of a Summarization Handler, that users can optionally use when ingesting documents. In fact, considering this handler, there are at least three ways to import a document:\n",
    "\n",
    "1. Import a document, creating memory records for each \"chunk\".\n",
    "2. Import a document, creating memory records for each \"chunk\", and generating a summary memory record.\n",
    "3. Import a document, creating only the summary synthetic memory record.\n",
    "\n",
    "If you add your custom pipeline handlers to KM, you can further customize what KM does with your documents. For instance you could:\n",
    "\n",
    "1. Use a Translation Handler to translate every document into other languages\n",
    "2. Use a Monitoring Handler to analyze memories and create or update some statistics stored in memory (or anywhere you like)\n",
    "3. Use a ChatLog Handler to detect customers having troubles with some product\n",
    "4. etc.\n",
    "\n",
    "Let's see how to create and retrieve summaries."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Configuration\n",
    "\n",
    "For our demo, we use also the \"dotenv\" nuget, to load our secret credentials from a `.env` file.\n",
    "Make sure you create your `.env` file, with your OpenAI API Key.\n",
    "\n",
    "> ```\n",
    "> OPENAI_API_KEY=<your OpenAI API key>\n",
    "> ```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "dotnet_interactive": {
     "language": "csharp"
    },
    "polyglot_notebook": {
     "kernelName": "csharp"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><div></div><div></div><div><strong>Installed Packages</strong><ul><li><span>dotenv.net, 3.1.3</span></li></ul></div></div>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "#r \"nuget: dotenv.net, 3.1.3\"\n",
    "\n",
    "dotenv.net.DotEnv.Load();\n",
    "var env = dotenv.net.DotEnv.Read();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's setup KM as usual:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "dotnet_interactive": {
     "language": "csharp"
    },
    "polyglot_notebook": {
     "kernelName": "csharp"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><div></div><div></div><div><strong>Installed Packages</strong><ul><li><span>Microsoft.KernelMemory.Core, 0.29.240219.2</span></li></ul></div></div>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "info: Microsoft.KernelMemory.Handlers.TextExtractionHandler[0]\n",
      "      Handler 'extract' ready\n",
      "info: Microsoft.KernelMemory.Handlers.TextPartitioningHandler[0]\n",
      "      Handler 'partition' ready\n",
      "info: Microsoft.KernelMemory.Handlers.SummarizationHandler[0]\n",
      "      Handler 'summarize' ready\n",
      "info: Microsoft.KernelMemory.Handlers.GenerateEmbeddingsHandler[0]\n",
      "      Handler 'gen_embeddings' ready, 1 embedding generators\n",
      "info: Microsoft.KernelMemory.Handlers.SaveRecordsHandler[0]\n",
      "      Handler save_records ready, 1 vector storages\n",
      "info: Microsoft.KernelMemory.Handlers.DeleteDocumentHandler[0]\n",
      "      Handler 'private_delete_document' ready\n",
      "info: Microsoft.KernelMemory.Handlers.DeleteIndexHandler[0]\n",
      "      Handler 'private_delete_index' ready\n",
      "info: Microsoft.KernelMemory.Handlers.DeleteGeneratedFilesHandler[0]\n",
      "      Handler 'delete_generated_files' ready\n",
      "Kernel Memory ready.\n"
     ]
    }
   ],
   "source": [
    "#r \"nuget: Microsoft.KernelMemory.Core, 0.29.240219.2\"\n",
    "\n",
    "using Microsoft.KernelMemory;\n",
    "\n",
    "var memory = new KernelMemoryBuilder()\n",
    "    .WithOpenAIDefaults(env[\"OPENAI_API_KEY\"])\n",
    "    .Build<MemoryServerless>();\n",
    "\n",
    "Console.WriteLine(\"Kernel Memory ready.\");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Import document with custom pipeline steps\n",
    "\n",
    "Now, let's import and summarize a document.\n",
    "\n",
    "Note the `steps` parameter, that we use to customize the pipeline, asking to just generate a summary record, nothing else. \n",
    "The constant is just a shortcut for a list of strings: `{ \"extract\", \"summarize\", \"gen_embeddings\", \"save_records\" }`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "dotnet_interactive": {
     "language": "csharp"
    },
    "polyglot_notebook": {
     "kernelName": "csharp"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Document imported and summarized.\n"
     ]
    }
   ],
   "source": [
    "await memory.ImportDocumentAsync(\"NASA-news.pdf\", documentId: \"doc001\", steps: Constants.PipelineOnlySummary);\n",
    "\n",
    "Console.WriteLine(\"Document imported and summarized.\");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Fetch and print the summary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "dotnet_interactive": {
     "language": "csharp"
    },
    "polyglot_notebook": {
     "kernelName": "csharp"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "== NASA-news.pdf summary ==\n",
      "\n",
      "NASA is inviting media to see the test version of the Orion spacecraft and the recovery hardware that will be used for the Artemis II mission. The event will take place at Naval Base San Diego on August 2nd. Personnel from NASA, the U.S. Navy, and the U.S. Air Force will be available to speak with media. The recovery operations team is currently conducting tests in the Pacific Ocean to prepare for Artemis II, which will be NASA's first crewed mission under the Artemis program. The Artemis II crew will participate in recovery testing at sea next year.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "var results = await memory.SearchSummariesAsync(filter: MemoryFilters.ByDocument(\"doc001\"));\n",
    "\n",
    "foreach (var result in results)\n",
    "{\n",
    "    var summary = result.Partitions.First().Text;\n",
    "    Console.WriteLine($\"== {result.SourceName} summary ==\\n\\n{summary}\\n\");\n",
    "}"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".NET (C#)",
   "language": "C#",
   "name": ".net-csharp"
  },
  "language_info": {
   "name": "polyglot-notebook"
  },
  "polyglot_notebook": {
   "kernelInfo": {
    "defaultKernelName": "csharp",
    "items": [
     {
      "aliases": [],
      "languageName": "csharp",
      "name": "csharp"
     }
    ]
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
